Assignment 1 
Due Date 15 Feb 2005 

Each question is worth 10 points, and the assignment is worth 8% of the total marks for the unit. 
Question 1. 

What are the different types of fault-tolerant systems? Provide original examples of different 
types of systems in real life described in the class. Explain briefly your answer. 

There are three main types of fault-tolerant systems: 

(i) Long Life Systems: 

The most common examples of long-life applications are unmanned spacecraft and satellites. 
A typical requirement of a long-life application is a probability of 0.95 or greater of being 
operational at the end of a ten-year period. Unlike other applications, however, long-life systems can 
often tolerate an extended outage as long as the system can eventually be made operational again. 
In addition, long-life applications can frequently allow the system to be reconfigured manually by 
the operators. 
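
To get a feel for how demanding this requirement is, the sketch below converts it into a maximum tolerable failure rate. It assumes a constant failure rate (the exponential model R(t) = e^(-λt)), which the requirement itself does not state, and a year of 8760 hours. The function name `max_failure_rate` is illustrative only.

```python
import math

HOURS_PER_YEAR = 8760  # assumption: 365-day year


def max_failure_rate(target_reliability, mission_hours):
    """Largest constant failure rate (per hour) that still meets the target.

    Assumes the exponential model R(t) = exp(-lambda * t), so
    lambda = -ln(R) / t.
    """
    if not 0.0 < target_reliability <= 1.0:
        raise ValueError("target_reliability must be in (0, 1]")
    if mission_hours <= 0:
        raise ValueError("mission_hours must be positive")
    return -math.log(target_reliability) / mission_hours


ten_years = 10 * HOURS_PER_YEAR
lam = max_failure_rate(0.95, ten_years)
print(f"max failure rate: {lam:.3e} per hour")
```

Under these assumptions the limit comes out at roughly 6 x 10^-7 failures per hour for the system as a whole.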

(ii) High Availability Applications: 

Availability is a key parameter in many applications. Banking and other time-shared systems are 
good examples of high availability applications. Users of these systems want to have a high 
probability of receiving service when it is requested. The Tandem NonStop transaction processing 
system (Katzman, 1977) is a good example of a system designed for high availability. 

(iii) Ultra-Reliable Systems: 

Perhaps the most widely publicized applications of fault-tolerant computing are those where the 
computations are critical to human safety, environmental cleanliness or equipment protection. 
Examples include aircraft flight control systems, military systems, and certain types of industrial 
controllers. 

Question 2. 

What is the essential difference between long life and highly available systems? 

The essential difference between long-life and highly available systems is that long-life systems 
generally cannot be maintained or repaired during their mission, whereas highly available systems can 
be repaired. The key concept of long-life systems is that they are designed to maintain a certain level 
of reliability over their given or predicted mission time. Highly available systems, by contrast, must 
work correctly whenever service is requested, because downtime is costly. 

For example, it would not be possible to repair the Voyager spacecraft if it developed a fault 
while in space, whereas it would be possible to repair an ftServer; however, the downtime 
arising from this repair would be very costly to a company relying on the hardware to run its 
business. 

Question 3. 

Does software in general exhibit increasing or decreasing failure rates? 

Generally, software exhibits a decreasing failure rate, because faults are removed as they are found 
and software does not wear out. Software reliability depends largely on the size of the code and its 
complexity. There is a slight difference between open-source and proprietary software: bug reports 
for open-source software are often filed and fixed quickly, whereas large-scale proprietary software 
(such as Microsoft applications) is less dynamic in this sense. 

Question 4: 

What is the essential difference between hardware and software faults? 

Software faults are caused by design errors; ambiguities, oversights or misinterpretation of the 
specification that the software is supposed to satisfy; carelessness or incompetence in writing code; 
inadequate testing; incorrect or unexpected usage of the software; or other unforeseen problems. On 
the other hand, hardware faults can also be caused by design errors, but they additionally arise from 
component defects, the operating environment and transient faults. 

Hardware faults can mainly be overcome by redundancy; software faults, however, are intrinsic to the 
piece of software, so identical redundant copies contain the same fault and redundancy alone does not 
cover them. 

Question 5. 

Given that a particular device exhibits 1 failure in a million hours, what is the failure rate? 

The failure rate for a device that fails once in 1 million hours is $\frac{1}{10^{6}} = 10^{-6} = 0.000001$ failures per hour 
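
The same arithmetic can be checked with a short sketch. It treats the failure rate as constant and simply divides observed failures by operating hours; the function name `failure_rate` is hypothetical, chosen for illustration.

```python
def failure_rate(failures, operating_hours):
    """Constant failure rate estimate: failures per operating hour."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return failures / operating_hours


# One failure observed in one million operating hours.
lam_device = failure_rate(1, 1_000_000)
print(f"device failure rate: {lam_device:.1e} per hour")
```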


Question 6. 

Given 1000 of the devices described in Question 5 interconnected to form a system, what is the system 
failure rate? 

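Assuming the system fails as soon as any one device fails (a series arrangement), the device failure 
rates add, so the system failure rate of 1000 of these devices interconnected is 
$\frac{10^{3}}{10^{6}} = \frac{1}{10^{3}} = 10^{-3} = 0.001$ failures per hour 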
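
A minimal sketch of this calculation follows. It assumes a series system of independent devices with constant failure rates, so that the system rate is the sum of the device rates; the question does not state the topology, and `series_failure_rate` is an illustrative name.

```python
import math


def series_failure_rate(device_rates):
    """Failure rate of a series system: the sum of its devices' constant rates."""
    return math.fsum(device_rates)


# 1000 identical devices, each with a failure rate of 1e-6 per hour.
lam_system = series_failure_rate([1e-6] * 1000)
print(f"system failure rate: {lam_system:.1e} per hour")
```

Summing a list rather than multiplying by the device count keeps the sketch usable when the devices do not all share the same rate.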



Question 7. 

What is the reliability of the above system after 10 hours? 

The reliability is $R(t) = e^{-\lambda t}$ 

Given: $\lambda = 10^{-3}$, $t = 10$, therefore $R(t) = e^{-(0.001) \times 10} = e^{-0.01} \approx 0.99$ 
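
The exponential reliability formula can be evaluated directly, as in the sketch below. The function name `reliability` is illustrative; the rate and time values are the ones given in the answer.

```python
import math


def reliability(failure_rate, hours):
    """R(t) = exp(-lambda * t) for a constant failure rate lambda."""
    return math.exp(-failure_rate * hours)


# System failure rate of 1e-3 per hour, evaluated at t = 10 hours.
r10 = reliability(1e-3, 10)
print(f"R(10) = {r10:.4f}")
```

The result is about 0.990, which agrees with 0.99 to two decimal places.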

Question 8. 

What is the MTTF of the above system? NOTE: There will be an in-class 

The MTTF of this system is $MTTF = \int_{0}^{\infty} R(t)\,dt = \frac{1}{\lambda} = \frac{1}{10^{-3}} = 1000$ hours 
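
As a cross-check on the closed form, the integral of R(t) can be approximated numerically. The sketch below uses the trapezoidal rule and truncates the infinite upper limit at 20/λ hours, where R(t) is negligible; the step size, the truncation point and the name `mttf_numeric` are all assumptions made for illustration.

```python
import math


def mttf_numeric(failure_rate, step_hours=1.0, horizon_factor=20.0):
    """Approximate MTTF as the integral of R(t) dt from 0 to infinity.

    Uses the trapezoidal rule on R(t) = exp(-failure_rate * t), truncated at
    horizon_factor / failure_rate hours, where R(t) is negligibly small.
    """
    horizon = horizon_factor / failure_rate
    steps = int(round(horizon / step_hours))
    total = 0.0
    prev = 1.0  # R(0) = 1
    for k in range(1, steps + 1):
        cur = math.exp(-failure_rate * k * step_hours)
        total += 0.5 * (prev + cur) * step_hours
        prev = cur
    return total


lam_system = 1e-3
mttf = mttf_numeric(lam_system)
print(f"numeric MTTF: {mttf:.3f} hours, closed form: {1 / lam_system:.3f} hours")
```

The numeric value should agree with 1/λ = 1000 hours to well within a hundredth of an hour for these settings.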



